
GH-15100: [C++][Parquet] Add benchmark for reading strings from Parquet #15101

Merged
pitrou merged 4 commits into apache:master from wjones127:feat/parquet-string-bench on Jan 5, 2023

wjones127 (Member) commented Dec 27, 2022

wjones127 changed the title from "GH-15100: [C++][Parquet] Add benchmark for reading strings from Parque" to "GH-15100: [C++][Parquet] Add benchmark for reading strings from Parquet" on Dec 27, 2022
@github-actions


wjones127 (Member, Author):

@ursabot please benchmark command=cpp-micro --suite-filter=parquet-arrow-reader-writer-benchmark

wjones127 marked this pull request as ready for review on December 28, 2022 16:10

wjones127 (Member, Author):

@ursabot please benchmark command=cpp-micro --suite-filter=parquet-arrow-reader-writer-benchmark

wjones127 (Member, Author):

@ursabot please benchmark

ursabot commented Dec 30, 2022
Benchmark runs are scheduled for baseline = 6236dba and contender = 3c02495. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] test-mac-arm
[Finished ⬇️2.04% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️1.81% ⬆️0.14%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3c02495f ec2-t3-xlarge-us-east-2
[Failed] 3c02495f test-mac-arm
[Finished] 3c02495f ursa-i9-9960x
[Finished] 3c02495f ursa-thinkcentre-m75q
[Finished] 6236dbac ec2-t3-xlarge-us-east-2
[Failed] 6236dbac test-mac-arm
[Finished] 6236dbac ursa-i9-9960x
[Finished] 6236dbac ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot commented Dec 30, 2022

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

Comment thread cpp/src/parquet/arrow/reader_writer_benchmark.cc Outdated
Comment thread cpp/src/parquet/arrow/reader_writer_benchmark.cc
Comment thread cpp/src/parquet/arrow/reader_writer_benchmark.cc Outdated
::arrow::schema({::arrow::field("column", type, null_percentage > 0)}), {arr});
}

static void BM_WriteBinaryColumn(::benchmark::State& state) {

Reviewer (Member):

Does it use the PLAIN encoding? Add a comment?

wjones127 (Member, Author):

I added a comment near the parameters of each benchmark explaining that we use unique_values to trigger the code paths for dictionary and plain encodings. I tried to add a check within the benchmark to validate that we get the expected encodings, but found it too complicated: the encodings can change from page to page and also apply to the definition and repetition levels (IIUC).

Reviewer (Member):

I see. Can you just confirm that the expected encodings are used (and add a comment)?

Reviewer (Member):

Just saw the comment below, sorry. Please disregard. :-)
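The reply above says the benchmark varies unique_values to steer the writer toward the dictionary or the PLAIN code path. A minimal sketch of why that works, under a deliberately simplified model: the distinct values form a PLAIN-encoded dictionary page (each BYTE_ARRAY stored as a 4-byte length followed by its bytes), and the writer keeps dictionary encoding only while that page stays under a size limit, otherwise falling back to PLAIN. The helper names `MakeStrings`, `DictionaryPageBytes` and `ExpectedEncoding` are hypothetical and not part of reader_writer_benchmark.cc, and the 1 MiB default is an assumption meant to mirror the Parquet C++ writer's default dictionary page size limit. The real writer decides incrementally and can switch encodings partway through a column chunk, which is exactly why a per-page check was judged too complicated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the benchmark's data generator: `num_values`
// strings cycling through `unique_values` distinct strings, each padded
// with 'x' to at least `min_length` bytes.
std::vector<std::string> MakeStrings(std::size_t num_values,
                                     std::size_t unique_values,
                                     std::size_t min_length) {
  assert(unique_values > 0 && "need at least one distinct value");
  std::vector<std::string> out;
  out.reserve(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    std::string s = "v" + std::to_string(i % unique_values);
    if (s.size() < min_length) s.append(min_length - s.size(), 'x');
    out.push_back(std::move(s));
  }
  return out;
}

// Size of a PLAIN-encoded dictionary page holding the distinct values:
// every BYTE_ARRAY costs a 4-byte length prefix plus its payload.
std::int64_t DictionaryPageBytes(const std::vector<std::string>& values) {
  std::unordered_set<std::string> distinct(values.begin(), values.end());
  std::int64_t bytes = 0;
  for (const auto& v : distinct) {
    bytes += 4 + static_cast<std::int64_t>(v.size());
  }
  return bytes;
}

enum class Encoding { kDictionary, kPlain };

// Simplified model (assumption): dictionary encoding survives only while the
// whole dictionary fits under `limit`; otherwise the column falls back to PLAIN.
Encoding ExpectedEncoding(const std::vector<std::string>& values,
                          std::int64_t limit = 1024 * 1024) {
  return DictionaryPageBytes(values) <= limit ? Encoding::kDictionary
                                              : Encoding::kPlain;
}
```

For example, 1,000 values drawn from 10 distinct 8-byte strings need a 120-byte dictionary and stay dictionary-encoded, while 100,000 distinct 32-byte strings need about 3.6 MB and, in this model, fall back to PLAIN.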

Comment thread cpp/src/parquet/arrow/reader_writer_benchmark.cc Outdated
Comment thread cpp/src/parquet/arrow/reader_writer_benchmark.cc Outdated
wjones127 requested a review from pitrou on January 4, 2023 21:13
pitrou merged commit 040310f into apache:master on Jan 5, 2023
EpsilonPrime pushed a commit to EpsilonPrime/arrow that referenced this pull request Jan 5, 2023
… Parquet (apache#15101)

* Closes: apache#15100

Authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>

ursabot commented Jan 5, 2023
Benchmark runs are scheduled for baseline = 25b5093 and contender = 040310f. 040310f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️8.15% ⬆️6.76%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.47% ⬆️0.17%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 040310fe ec2-t3-xlarge-us-east-2
[Failed] 040310fe test-mac-arm
[Finished] 040310fe ursa-i9-9960x
[Finished] 040310fe ursa-thinkcentre-m75q
[Finished] 25b50932 ec2-t3-xlarge-us-east-2
[Failed] 25b50932 test-mac-arm
[Finished] 25b50932 ursa-i9-9960x
[Finished] 25b50932 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot commented Jan 5, 2023
['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm
ursa-i9-9960x

ursabot commented Jan 6, 2023
['Python', 'R'] benchmarks have high level of regressions.
test-mac-arm


Development

Successfully merging this pull request may close these issues.

Add benchmarks for reading and writing strings

3 participants